Space-Efficient Indexing of Spaced Seeds for Accurate Overlap Computation of Raw Optical Mapping Data

Abstract

A key problem in processing raw optical mapping data (Rmaps) is finding Rmaps originating from the same genomic region. These sets of related Rmaps can be used to correct errors in Rmap data, and to find overlaps between Rmaps to assemble consensus optical maps. Previous Rmap overlap aligners are computationally very expensive and do not scale to large eukaryotic data sets. We present Selkie, an Rmap overlap aligner based on a spaced $(\ell,k)$-mer index which was pioneered in the Rmap error correction tool Elmeri. Here we present a space-efficient version of the index which is twice as fast as prior art while using just a quarter of the memory on a human data set. Moreover, our index can be used for filtering candidates for Rmap overlap computation, whereas Elmeri used the index only for error correction of Rmaps. By combining our filtering of Rmaps with the exhaustive, but highly accurate, algorithm of Valouev et al. (2006), Selkie maintains or increases the accuracy of finding overlapping Rmaps on a bacterial dataset while being at least four times faster. Furthermore, for the human dataset, Selkie is up to two orders of magnitude faster than previous methods.
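The abstract describes a filter-then-verify pipeline. An index over Rmaps selects candidate pairs, and only those pairs are passed to the exhaustive Valouev et al. (2006) aligner. The following is a minimal Python sketch of that idea under loud assumptions:

- It uses a simplified, hypothetical key made of k consecutive fragment lengths, each floor-divided into bins of width `bin_size`. This is not Selkie's actual spaced $(\ell,k)$-mer definition.
- It ignores Rmap orientation and missing or extra cut sites.
- The names `kmer_keys`, `build_index`, `candidate_pairs` and `min_shared` are illustrative only.

```python
from collections import defaultdict
from itertools import combinations


def kmer_keys(fragments, k, bin_size):
    """Yield one key per window of k consecutive fragments.

    Hypothetical simplification (not Selkie's (l,k)-mer): each fragment
    length is floor-divided into bins of width bin_size, so small sizing
    errors that stay inside a bin still produce the same key.
    """
    binned = [int(f // bin_size) for f in fragments]
    for i in range(len(binned) - k + 1):
        yield tuple(binned[i:i + k])


def build_index(rmaps, k, bin_size):
    """Map each key to the set of Rmap IDs that contain it."""
    index = defaultdict(set)
    for rid, fragments in rmaps.items():
        for key in kmer_keys(fragments, k, bin_size):
            index[key].add(rid)
    return index


def candidate_pairs(rmaps, k, bin_size, min_shared):
    """Return Rmap ID pairs sharing at least min_shared distinct keys.

    In a filter-then-verify scheme, only these pairs would be handed to an
    exhaustive overlap aligner. Reversed Rmaps and missing or extra cut
    sites are ignored in this sketch.
    """
    index = build_index(rmaps, k, bin_size)
    shared = defaultdict(int)
    for rids in index.values():
        # sorted() gives each unordered pair one canonical (a, b) form.
        for a, b in combinations(sorted(rids), 2):
            shared[(a, b)] += 1
    return {pair for pair, count in shared.items() if count >= min_shared}
```

Consider hypothetical fragment lengths `r1 = [12.1, 30.4, 8.2, 15.7, 22.0]` and `r2 = [30.9, 8.9, 15.1, 22.3, 40.0]`, with `k = 3` and `bin_size = 2.0`. Both Rmaps produce the keys `(15, 4, 7)` and `(4, 7, 11)`, so with `min_shared = 2` the pair `("r1", "r2")` is kept as a candidate.

The speedup in such a scheme comes from never aligning the pairs the filter drops. The cost is that a fragment whose measured size falls just across a bin boundary breaks an exact key match here. Tolerating such sizing errors is what a more careful seed design has to handle.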

Similar Sources

Designing Efficient Spaced Seeds for SOLiD Read Mapping

The advent of high-throughput sequencing technologies constituted a major advance in genomic studies, offering new prospects in a wide range of applications. We propose a rigorous and flexible algorithmic solution to mapping SOLiD color-space reads to a reference genome. The solution relies on an advanced method of seed design that uses a faithful probabilistic model of read matches and, on the ...

Fast Computation of Good Multiple Spaced Seeds

Homology search finds similar segments between two biological sequences, such as DNA or protein sequences. A significant fraction of computing power in the world is dedicated to performing such tasks. The introduction of optimal spaced seeds by Ma et al. has increased both the sensitivity and the speed of homology search and it has been adopted by many alignment programs such as BLAST. With the...

SpEED: fast computation of sensitive spaced seeds

SUMMARY Multiple spaced seeds represent the current state-of-the-art for similarity search in bioinformatics, with applications in various areas such as sequence alignment, read mapping, oligonucleotide design, etc. We present SpEED, a software program that computes highly sensitive multiple spaced seeds. SpEED can be several orders of magnitude faster and computes better seeds than the existin...

Efficient Seeds Computation Revisited

The notion of the cover is a generalization of a period of a string, and there are linear time algorithms for finding the shortest cover. The seed is a more complicated generalization of periodicity: it is a cover of a superstring of a given string, and the shortest seed problem is of much higher algorithmic difficulty. The problem is not well understood, no linear time algorithm is known. In t...

PerM: efficient mapping of short sequencing reads with periodic full sensitive spaced seeds

MOTIVATION The explosion of next-generation sequencing data has spawned the design of new algorithms and software tools to provide efficient mapping for different read lengths and sequencing technologies. In particular, ABI's sequencer (SOLiD system) poses a big computational challenge with its capacity to produce very large amounts of data, and its unique strategy of encoding sequence data int...

Journal

Journal title: IEEE/ACM Transactions on Computational Biology and Bioinformatics

Year: 2022

ISSN: 2374-0043, 1557-9964, 1545-5963

DOI: https://doi.org/10.1109/tcbb.2021.3085086